33 research outputs found

    Modular Anti-noise Deep Learning Network for Robotic Grasp Detection Based on RGB Images

    Full text link
    While traditional methods rely on depth sensors, the current trend leans towards cost-effective RGB images, despite their lack of explicit depth cues. This paper introduces an approach to detecting grasping poses from a single RGB image. To this end, we propose a modular learning network augmented with grasp detection and semantic segmentation, tailored for robots equipped with parallel-plate grippers. Our network not only identifies graspable objects but also fuses prior grasp analyses with semantic segmentation, thereby boosting grasp detection precision. Significantly, our design exhibits resilience, adeptly handling blurred and noisy visuals. Key contributions encompass a trainable network for grasp detection from RGB images, a modular design facilitating feasible grasp implementation, and an architecture robust against common image distortions. We demonstrate the feasibility and accuracy of our proposed approach through practical experiments and evaluations.
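
    The abstract describes the network only at a high level; the PyTorch sketch below illustrates the general idea of fusing a semantic-segmentation output back into a grasp-detection head that predicts per-pixel grasp quality, angle, and width for a parallel-plate gripper. All module names, channel sizes, and the output parameterization are illustrative assumptions, not the authors' published design.

    # Hypothetical sketch: a shared encoder feeds a segmentation head whose
    # output is fused back into the grasp head (not the authors' exact layers).
    import torch
    import torch.nn as nn

    class GraspSegNet(nn.Module):
        def __init__(self, num_classes: int = 2):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            )
            self.seg_head = nn.Conv2d(64, num_classes, 1)      # graspable-object mask
            self.grasp_head = nn.Sequential(                   # fuses features with the mask
                nn.Conv2d(64 + num_classes, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 4, 1),                           # quality, cos2θ, sin2θ, width
            )

        def forward(self, rgb):
            feats = self.encoder(rgb)
            seg = self.seg_head(feats)
            grasp = self.grasp_head(torch.cat([feats, seg.softmax(dim=1)], dim=1))
            return seg, grasp

    net = GraspSegNet()
    seg, grasp = net(torch.randn(1, 3, 224, 224))   # e.g. a blurred or noisy RGB frame
    print(seg.shape, grasp.shape)                   # (1, 2, 224, 224), (1, 4, 224, 224)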

    Dynamics of Vocalization-Induced Modulation of Auditory Cortical Activity at Mid-utterance

    Get PDF
    Background: Recent research has addressed the suppression of cortical sensory responses to altered auditory feedback that occurs at speech utterance onset. However, there is reason to assume that the mechanisms underlying sensorimotor processing at mid-utterance differ from those involved in sensorimotor control at utterance onset. The present study examined the dynamics of event-related potentials (ERPs) to different acoustic versions of auditory feedback at mid-utterance. Methodology/Principal findings: Subjects produced a vowel sound while hearing, via headphones, their pitch-shifted voice (100 cents), a sum of their vocalization and pure tones, or a sum of their vocalization and white noise at mid-utterance. Subjects also passively listened to playback of what they heard during active vocalization. Cortical ERPs were recorded in response to the different acoustic versions of feedback changes during both active vocalization and passive listening. The results showed that, relative to passive listening, active vocalization yielded enhanced P2 responses to the 100-cent pitch shifts, whereas suppression of P2 responses was observed when voice auditory feedback was distorted by pure tones or white noise. Conclusion/Significance: The present findings demonstrate, for the first time, a dynamic modulation of cortical activity as a function of the quality of acoustic feedback at mid-utterance, suggesting that auditory cortical responses can be enhanced or suppressed to distinguish self-produced speech from externally produced sounds.
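
    As a concrete illustration of the feedback manipulations described above, the sketch below builds a 100-cent (one-semitone) pitch-shift factor and simple pure-tone and white-noise mixtures. The sampling rate, tone frequency, and mixing gains are assumptions for illustration, not the study's stimulus parameters.

    # Minimal sketch of the three feedback conditions (assumed parameters).
    import numpy as np

    fs = 44100                                  # sampling rate in Hz (assumed)
    t = np.arange(fs) / fs                      # one second of signal
    voice = np.sin(2 * np.pi * 220 * t)         # stand-in for the subject's vocalization

    shift_ratio = 2 ** (100 / 1200)             # 100 cents is a factor of about 1.0595
    pitch_shifted = np.sin(2 * np.pi * 220 * shift_ratio * t)   # pitch-shifted feedback

    tone_masked = voice + 0.5 * np.sin(2 * np.pi * 1000 * t)    # vocalization plus pure tone
    noise_masked = voice + 0.5 * np.random.randn(t.size)        # vocalization plus white noise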

    Transfer Effect of Speech-sound Learning on Auditory-motor Processing of Perceived Vocal Pitch Errors

    Get PDF
    Speech perception and production are intimately linked. There is evidence that speech motor learning results in changes to auditory processing of speech. Whether speech motor control benefits from perceptual learning in speech, however, remains unclear. This event-related potential study investigated whether speech-sound learning can modulate the processing of feedback errors during vocal pitch regulation. Mandarin speakers were trained to perceive five Thai lexical tones while learning to associate pictures with spoken words over five days. Before and after training, participants produced sustained vowel sounds while hearing their vocal pitch feedback unexpectedly perturbed. Compared with the pre-training session, the magnitude of vocal compensation decreased significantly for the control group but remained consistent for the trained group at the post-training session. However, the trained group had smaller and faster N1 responses to pitch perturbations and exhibited enhanced P2 responses that correlated significantly with their learning performance. These findings indicate that the cortical processing of vocal pitch regulation can be shaped by learning new speech-sound associations, suggesting that perceptual learning in speech transfers to, and facilitates, the neural mechanisms underlying the online monitoring of auditory feedback during vocal production.

    Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization

    Full text link
    Recent pre-trained language models (PLMs) achieve promising results on existing abstractive summarization datasets. However, existing summarization benchmarks overlap in time with the standard pre-training corpora and fine-tuning datasets. Hence, the strong performance of PLMs may rely on the parametric knowledge that is memorized during pre-training and fine-tuning. Moreover, the knowledge memorized by PLMs may quickly become outdated, which affects their generalization performance on future data. In this work, we propose TempoSum, a novel benchmark that contains data samples from 2010 to 2022, to understand the temporal generalization ability of abstractive summarization models. Through extensive human evaluation, we show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data. Moreover, existing faithfulness enhancement methods cannot reliably improve the faithfulness of summarization models on future data. Finally, we discuss several recommendations to the research community on how to evaluate and improve the temporal generalization capability of text summarization models. Comment: Accepted at EMNLP 202
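
    The benchmark's central idea is a split along publication time: fine-tune on earlier articles and evaluate on later ("future") ones. The sketch below shows the kind of temporal partition this implies; the record fields and the cutoff year are assumptions, not TempoSum's actual schema or split.

    # Hypothetical temporal split for probing future-data generalization.
    from datetime import date

    articles = [
        {"date": date(2015, 6, 1), "text": "...", "summary": "..."},
        {"date": date(2021, 3, 9), "text": "...", "summary": "..."},
    ]

    CUTOFF = date(2019, 1, 1)                   # assumed boundary between "past" and "future"
    train = [a for a in articles if a["date"] < CUTOFF]
    future_eval = [a for a in articles if a["date"] >= CUTOFF]
    print(len(train), "training samples,", len(future_eval), "future-evaluation samples")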

    Spatiotemporal distribution and dynamics evolution of artificial intelligence development in China

    No full text
    The quantified measurement and comprehensive analysis of artificial intelligence development (AIDEV) are vital for countries seeking to form an AI industrial ecology and promote the long-term development of regional AI technology. Based on innovation ecosystems (IE) theory, this paper constructs an evaluation system to measure and analyze the spatiotemporal distribution and dynamic evolution of AIDEV in China from 2011 to 2020. The results show that AIDEV in China presents an overall upward trend and a clear spatial imbalance (eastern > central > western). Meanwhile, provinces with low-level AIDEV are catching up with high-level provinces, narrowing the regional differences in AIDEV. Moreover, the concentration and polarization of AIDEV in China have been weakening, and AIDEV is projected to continue increasing over the next three years. Further, AIDEV shows significantly positive spatial autocorrelation. Finally, provinces with high AIDEV increase the probability that the AIDEV of surrounding provinces will develop. This paper expands the research stream in the field of AI research, extends the application scenarios of IE theory, and puts forward relevant policy recommendations.
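
    The abstract reports significantly positive spatial autocorrelation of AIDEV but does not name the statistic; global Moran's I is a common choice for such province-level analyses, sketched below with made-up AIDEV scores and an assumed binary contiguity weight matrix.

    # Global Moran's I on illustrative data (not the paper's measurements).
    import numpy as np

    def morans_i(x, W):
        x = np.asarray(x, dtype=float)
        z = x - x.mean()                          # deviations from the mean
        num = (W * np.outer(z, z)).sum()          # sum over i, j of w_ij * z_i * z_j
        return x.size / W.sum() * num / (z ** 2).sum()

    aidev = np.array([0.9, 0.8, 0.5, 0.4, 0.2])   # made-up province-level AIDEV scores
    W = np.array([[0, 1, 1, 0, 0],                # 1 marks assumed neighboring provinces
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 0]])
    print(round(morans_i(aidev, W), 3))           # positive value: similar provinces cluster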

    Self-Attentive Generative Adversarial Network for Cloud Detection in High Resolution Remote Sensing Images

    No full text

    Optically pumped Milliwatt Whispering-Gallery microcavity laser

    No full text
    Whispering-gallery-mode microcavity lasers possess remarkable characteristics such as high Q factors and compact geometries, making them an essential element in the evolution of microlasers. However, solid-state whispering-gallery-mode lasers have previously suffered from low output power and limited optical conversion efficiency, hindering their applications. Here, we present the achievement of milliwatt laser emission at a wavelength of 1.06 µm from a solid-state whispering-gallery-mode laser. To accomplish this, we construct a whispering-gallery-mode microcavity (with a diameter of 30 µm) using a crystalline Nd:YAG thin film obtained through carbon-implantation-enhanced etching of an Nd:YAG crystal. This microcavity laser demonstrates a maximum output power of 1.12 mW and an optical conversion efficiency of 12.4%. Moreover, our unique eccentric microcavity design enables efficient coupling of free-space pump light, facilitating integration with a waveguide. This integration allows single-wavelength laser emission from the waveguide, achieving an output power of 0.5 mW and an optical conversion efficiency of 6.18%. Our work opens up new possibilities for advancing solid-state whispering-gallery-mode lasers, providing a viable option for compact photonic sources.
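
    As a quick sanity check on the reported figures: optical conversion efficiency is output power divided by pump power, so the implied pump powers follow directly (assuming the quoted efficiencies are referenced to the incident pump power).

    # Back-of-envelope pump powers implied by the reported output and efficiency.
    cavity_out_mw, cavity_eff = 1.12, 0.124         # microcavity: 1.12 mW at 12.4%
    waveguide_out_mw, waveguide_eff = 0.5, 0.0618   # waveguide: 0.5 mW at 6.18%

    print(f"implied pump, microcavity: {cavity_out_mw / cavity_eff:.1f} mW")       # ~9.0 mW
    print(f"implied pump, waveguide:   {waveguide_out_mw / waveguide_eff:.1f} mW")  # ~8.1 mW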

    Deep learning based thin cloud removal fusing vegetation red edge and short wave infrared spectral information for Sentinel-2A imagery

    Get PDF
    Thin clouds seriously affect the availability of optical remote sensing images, especially in visible bands. Short-wave infrared (SWIR) bands are less influenced by thin clouds, but usually have lower spatial resolution than visible (Vis) bands in high spatial resolution remote sensing images (e.g., in the Sentinel-2A/B, CBERS04, ZY-1 02D and HJ-1B satellites). Most cloud removal methods do not take advantage of the spectral information available in SWIR bands, which are less affected by clouds, to restore the background information tainted by thin clouds in Vis bands. In this paper, we propose CR-MSS, a novel deep learning-based thin cloud removal method that takes the SWIR and vegetation red edge (VRE) bands as inputs in addition to the visible/near-infrared (Vis/NIR) bands, in order to improve cloud removal in Sentinel-2 visible bands. Contrary to some traditional and deep learning-based cloud removal methods, which use manually designed rescaling algorithms to handle bands at different resolutions, CR-MSS uses convolutional layers to automatically process bands at different resolutions. CR-MSS has two input/output branches designed to process Vis/NIR and VRE/SWIR bands, respectively. First, the Vis/NIR cloudy bands are down-sampled by a convolutional layer to low spatial resolution features, which are then concatenated with the corresponding features extracted from the VRE/SWIR bands. Second, the concatenated features are put into a fusion tunnel to down-sample and fuse the spectral information from the Vis/NIR and VRE/SWIR bands. Third, a decomposition tunnel is designed to up-sample and decompose the fused features. Finally, a transposed convolutional layer is used to up-sample the feature maps to the resolution of the input Vis/NIR bands. CR-MSS was trained on 28 real Sentinel-2A image pairs over the globe, and tested separately on eight real cloud image pairs and eight simulated cloud image pairs. The average SSIM (Structural Similarity Index Measure) values for CR-MSS results on the Vis/NIR bands over all testing images were 0.69, 0.71, 0.77, and 0.81, respectively, on average 1.74% higher than the best baseline method. The visual results on real Sentinel-2 images demonstrate that CR-MSS produces more realistic cloud and cloud shadow removal results than baseline methods.
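
    The paragraph above spells out the CR-MSS data flow (down-sample the Vis/NIR bands with a convolution, concatenate with VRE/SWIR features, fuse, decompose, then up-sample with a transposed convolution); the PyTorch sketch below mirrors that flow at a toy scale. Channel counts, layer depths, and band counts are illustrative assumptions, not the published CR-MSS configuration.

    # Toy dual-branch sketch of the described data flow (not the published layers).
    import torch
    import torch.nn as nn

    class CRMSSSketch(nn.Module):
        def __init__(self, vis_nir_bands: int = 4, vre_swir_bands: int = 6):
            super().__init__()
            self.vis_down = nn.Conv2d(vis_nir_bands, 32, 3, stride=2, padding=1)  # 10 m grid to 20 m grid
            self.vre_enc = nn.Conv2d(vre_swir_bands, 32, 3, padding=1)
            self.fusion = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())      # "fusion tunnel"
            self.decomp = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())      # "decomposition tunnel"
            self.vis_up = nn.ConvTranspose2d(64, vis_nir_bands, 4, stride=2, padding=1)  # back to Vis/NIR resolution
            self.vre_out = nn.Conv2d(64, vre_swir_bands, 3, padding=1)

        def forward(self, vis_nir, vre_swir):
            fused = self.fusion(torch.cat([self.vis_down(vis_nir), self.vre_enc(vre_swir)], dim=1))
            feats = self.decomp(fused)
            return self.vis_up(feats), self.vre_out(feats)   # cloud-removed Vis/NIR and VRE/SWIR

    net = CRMSSSketch()
    vis_free, vre_free = net(torch.randn(1, 4, 256, 256), torch.randn(1, 6, 128, 128))
    print(vis_free.shape, vre_free.shape)   # (1, 4, 256, 256) and (1, 6, 128, 128)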